Smart Paper Notes

MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves

Pranjali Pathre , Anurag Sahu , Ashwin Rao , Avinash Prabhu , Meher Shashwat Nigam , Tanvi Karandikar , Harit Pandya , K. Madhava Krishna

Categories: Computer Vision | Robotics

2022-11-30

In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks and the front and top view layouts of each shelf within a rack. With minimal effort, this output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects on each shelf, varying numbers of shelves, and other racks present in the background. Further, MVRackLay shows superior performance vis-a-vis its single-view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics. We also showcase a multi-view stitching of the 3D layouts, resulting in a representation of the warehouse scene with respect to a global reference frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first work to portray a 3D rendering of a warehouse scene in terms of its semantic components (racks, shelves and objects), all from a single monocular camera.
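
The abstract reports layout accuracy as mean IoU. As a rough illustration of that metric only, the sketch below computes intersection over union for binary layout masks and averages it over several prediction/ground-truth pairs. The function names `iou` and `mean_iou` and the tiny masks are hypothetical, and the paper's actual evaluation protocol (classes, thresholds, resolution) is not described here, so this should not be read as the authors' implementation.

```python
def iou(pred, target):
    """Intersection over union of two same-sized binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over a list of (prediction, ground_truth) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Hypothetical 2x3 masks, chosen only to make the arithmetic easy to follow.
pred_a = [[1, 1, 0], [0, 1, 0]]
true_a = [[1, 0, 0], [0, 1, 1]]   # intersection 2, union 4
pred_b = [[0, 1, 1], [0, 0, 0]]
true_b = [[0, 1, 1], [0, 0, 0]]   # identical masks

score = mean_iou([(pred_a, true_a), (pred_b, true_b)])
```

Averaging per-image IoU is one common convention; averaging per class is another, and the two can give different numbers on the same predictions.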

Adaptive ECCM for Mitigating Smart Jammers

Kunal Pattanayak , Shashwat Jain , Vikram Krishnamurthy , Chris Berry

Categories: Machine Learning

2022-12-05

This paper considers adaptive radar electronic counter-countermeasures (ECCM) to mitigate electronic countermeasures (ECM) by an adversarial jammer. Our ECCM approach models the jammer-radar interaction as a Principal Agent Problem (PAP), a popular economics framework for interaction between two entities with an information imbalance. In our setup, the radar does not know the jammer's utility. Instead, the radar learns the jammer's utility adaptively over time using inverse reinforcement learning. The radar's adaptive ECCM objective is two-fold: (1) maximize its utility by solving the PAP, and (2) estimate the jammer's utility by observing its response. Our adaptive ECCM scheme uses deep ideas from revealed preference in micro-economics and the principal agent problem in contract theory. Our numerical results show that, over time, our adaptive ECCM both identifies and mitigates the jammer's utility.

A comparative study of the performance of different search algorithms on FOON graphs

Kumar Shashwat

Categories: Robotics

2022-10-14

Robots find it hard to learn creatively and adapt to new, unseen challenges, mainly because of the limited information and experience available to them. Paulius et al. [1] presented a way to construct functional graphs that encapsulate. Sakib et al. [2] further expanded FOON objects for robotic cooking. This paper presents a comparative study of Breadth First Search (BFS), Greedy Breadth First search (GBFS) with two heuristic functions, and Iterative Depth First Search (IDFS), and compares their performance.
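
To make the kind of comparison described above concrete, here is a minimal sketch that runs breadth-first search and iterative deepening depth-first search on a small graph and counts how many nodes each one expands. It rests on several assumptions: the graph is a hypothetical toy adjacency list and not a FOON, "Iterative Depth First Search" is read as iterative deepening, and the expansion count is just one possible performance measure. The paper's own experimental setup is not given in this abstract.

```python
from collections import deque

# Hypothetical toy graph (adjacency lists); not an actual FOON.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def bfs(graph, start, goal):
    """Breadth-first search. Returns (path, number of nodes expanded)."""
    queue = deque([[start]])
    visited = {start}
    expanded = 0
    while queue:
        path = queue.popleft()
        node = path[-1]
        expanded += 1
        if node == goal:
            return path, expanded
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None, expanded


def depth_limited(graph, path, goal, limit, counter):
    """Depth-first search that stops descending once `limit` reaches zero."""
    node = path[-1]
    counter[0] += 1
    if node == goal:
        return path
    if limit == 0:
        return None
    for neighbor in graph[node]:
        if neighbor not in path:  # avoid cycles along the current path
            found = depth_limited(graph, path + [neighbor], goal, limit - 1, counter)
            if found:
                return found
    return None


def iddfs(graph, start, goal, max_depth=10):
    """Iterative deepening DFS. Returns (path, total nodes expanded over all depth limits)."""
    counter = [0]
    for limit in range(max_depth + 1):
        found = depth_limited(graph, [start], goal, limit, counter)
        if found:
            return found, counter[0]
    return None, counter[0]


bfs_path, bfs_expanded = bfs(GRAPH, "A", "F")
iddfs_path, iddfs_expanded = iddfs(GRAPH, "A", "F")
```

On this toy graph both searches find a shortest path, but iterative deepening expands more nodes because it restarts from the root at every depth limit. In exchange it only keeps the current path in memory, whereas BFS holds the whole frontier.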

Shape Analysis for Pediatric Upper Body Motor Function Assessment

Shashwat Kumar , Robert Gutierez , Debajyoti Datta , Sarah Tolman , Allison McCrady , Silvia Blemker , Rebecca J. Scharf , Laura Barnes

Categories: Machine Learning

2022-09-10

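Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function in 1 in 6,000 children. Traditional upper limb motor function assessments do not quantitatively measure patient performance, which makes it difficult to track incremental changes in progress. Assessing motor function in children with neuromuscular disorders is particularly challenging because they may be nervous or excited during experiments, or simply too young to follow precise instructions. These challenges translate into confounding factors, such as performing different parts of an arm curl slower or faster (phase variability), which affect the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors such as phase variability, while suggesting several clinically relevant insights. First, there are statistically significant differences between the functional scores of the control and patient populations ($p = 0.0213 \le 0.05$). Next, several patients in the patient cohort are able to perform motion on par with the healthy cohort, and vice versa. Our metric, which is computed from a wearable device, is related to the Brooke's score ($p = 0.00063 \le 0.05$) as well as to motor function assessments based on dynamometry ($p = 0.0006 \le 0.05$). These results show promise for ubiquitous motion quality assessment in daily life.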

Predictions of Reynolds and Nusselt numbers in turbulent convection using machine-learning models

Shashwat Bhattacharya , Mahendra K Verma , Arnab Bhattacharya

Categories: Machine Learning

2022-01-10

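In this paper, we develop a multivariate regression model and a neural network model to predict the Reynolds number (Re) and the Nusselt number in turbulent thermal convection. We compare their predictions with those of earlier models of convection: the Grossmann-Lohse [Phys. Rev. Lett. 86, 3316 (2001)], revised Grossmann-Lohse [Phys. Fluids 33, 015113 (2021)], and Pandey-Verma [Phys. Rev. E 94, 053106 (2016)] models. We observe that although the predictions of all the models are close to each other, the machine-learning models developed in this work provide the best match with the experimental and numerical results.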
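
As a loose illustration of what a multivariate regression for such a prediction can look like, the sketch below fits a power law by ordinary least squares in log space. Everything specific in it is an assumption made for the example: the choice of the Rayleigh number `ra` and Prandtl number `pr` as predictors, the power-law form, the helper names `solve` and `fit_power_law`, and the synthetic noise-free data with made-up coefficients. The abstract does not state the form of the paper's regression model, and none of the numbers below are results from the paper.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_power_law(ra, pr, nu):
    """Least-squares fit of log(nu) = c0 + c1*log(ra) + c2*log(pr).

    Returns (prefactor, ra_exponent, pr_exponent) for nu ~ prefactor * ra**c1 * pr**c2.
    """
    rows = [[1.0, math.log(r), math.log(p)] for r, p in zip(ra, pr)]
    y = [math.log(v) for v in nu]
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    c0, c1, c2 = solve(xtx, xty)
    return math.exp(c0), c1, c2


# Synthetic, noise-free data generated from made-up coefficients (NOT from the paper),
# used only to check that the fit recovers the values it was generated with.
ra = [1e6, 1e7, 1e8, 1e6, 1e7, 1e8]
pr = [1.0, 1.0, 1.0, 7.0, 7.0, 7.0]
nu = [0.1 * r ** 0.3 * p ** -0.05 for r, p in zip(ra, pr)]

prefactor, ra_exp, pr_exp = fit_power_law(ra, pr, nu)
```

Working in log space turns the power law into a linear model, so an ordinary least-squares solver suffices. With real, noisy measurements a numerical library would be a more robust choice than hand-written normal equations.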

MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules

Ranjeet Ranjan Jha , Abhishek Bhardwaj , Devin Garg , Arnav Bhavsar , Aditya Nigam

Categories: Computer Vision | Machine Learning

2021-12-27

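Resting-state fMRI is commonly used to diagnose Autism Spectrum Disorder (ASD) by means of network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating between imaging data of a control population and imaging data of ASD patients' brains is a non-trivial task. To tackle this classification task, we propose a novel deep learning architecture (MHATC) consisting of multi-head attention and temporal consolidation modules for classifying an individual as a patient with ASD. The devised architecture results from an in-depth analysis of the limitations of current deep neural network solutions for similar applications. Our approach is not only robust but also computationally efficient, which allows its adoption in a variety of other research and clinical settings.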

Semantic Segmentation of Legal Documents via Rhetorical Roles

Vijit Malik , Rishabh Sanjay , Shouvik Kumar Guha , Shubham Kumar Nigam , Angshuman Hazarika , Arnab Bhattacharya , Ashutosh Modi

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-03

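Legal documents are unstructured, use legal jargon, and are considerably long, which makes them difficult to process automatically with conventional text processing techniques. A legal document processing system would benefit substantially if documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a deep learning model based on multitask learning, with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting the rhetorical roles in a document, and the proposed model shows superior performance over existing models. Further, we apply RR to predict the judgment of legal cases and show that using RR enhances the prediction compared to transformer-based models.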

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video

David Junhao Zhang , Kunchang Li , Yunpeng Chen , Yali Wang , Shashwat Chandra , Yu Qiao , Luoqi Liu , Mike Zheng Shou

Categories: Computer Vision

2021-11-24

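Self-attention has become an integral component of recent network architectures, e.g., the Transformer, that dominate major image and video benchmarks. This is because self-attention can flexibly model long-range information. For the same reason, researchers have recently attempted to revive the Multi-Layer Perceptron (MLP) and have proposed a few MLP-like architectures that show great potential. However, current MLP-like architectures are not good at capturing local details and lack a progressive understanding of core details in images and/or videos. To overcome this issue, we propose a novel MorphMLP architecture that focuses on capturing local details in the low-level layers, while gradually changing to focus on long-term modeling in the high-level layers. Specifically, we design a fully-connected-like layer, dubbed MorphFC, consisting of two morphable filters that gradually grow their receptive field along the height and width dimensions. More interestingly, we propose to flexibly adapt our MorphFC layer to the video domain. To the best of our knowledge, we are the first to create an MLP-like backbone for learning video representations. Finally, we conduct extensive experiments on image classification, semantic segmentation, and video classification. Our MorphMLP, a self-attention-free backbone, can be as powerful as self-attention-based models.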

RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR

Yuyin Zhou , Shih-Cheng Huang , Jason Alan Fries , Alaa Youssef , Timothy J. Amrhein , Marcello Chang , Imon Banerjee , Daniel Rubin , Lei Xing , Nigam Shah

Categories: Computer Vision

2021-11-23

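Although radiologists routinely use electronic health record (EHR) data to form a clinical history and inform image interpretation, most deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel data alone highlights the potential for serious biases in models that fail to account for demographics and other key patient attributes. Yet the lack of imaging datasets that capture clinical context, including demographics and longitudinal medical history, has left multimodal medical imaging underexplored. To better assess these challenges, we present RadFusion, a multimodal benchmark dataset of 1794 patients with corresponding EHR data and high-resolution computed tomography (CT) scans labeled for pulmonary embolism. We evaluate several representative multimodal fusion models and benchmark their fairness properties across protected subgroups, e.g., gender, race/ethnicity, and age. Our results suggest that integrating imaging and EHR data can improve classification performance and robustness without introducing large disparities in the true positive rate between population groups.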

Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

Steve Yadlowsky , Scott Fleming , Nigam Shah , Emma Brunskill , Stefan Wager

Categories: Machine Learning (Statistics)

2021-11-15

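There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing treatment prioritization rules on a level playing field. RATEs are agnostic as to how the prioritization rules were derived, and only assess them based on how well they identify the units that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. We provide justification for the use of bootstrapped confidence intervals, and a framework for testing hypotheses about heterogeneity in treatment effectiveness correlated with the prioritization rule. Our definition of the RATE nests a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We demonstrate our approach in examples drawn from both personalized medicine and marketing. In the medical setting, using data from the SPRINT and ACCORD-BP randomized controlled trials, we find no significant evidence of heterogeneous treatment effects. On the other hand, in a large marketing trial, we find robust evidence of heterogeneity in the treatment effects of some digital advertising campaigns, and demonstrate how RATEs can be used to compare targeting rules that prioritize estimated risk with those that prioritize estimated treatment benefit.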
